Advances in Artificial Intelligence are creating new opportunities to improve the lives of people around the world, from business to healthcare and from lifestyle to education. For example, some systems profile users based on their demographic and behavioral characteristics to make domain-specific predictions. Often, such predictions affect users' lives directly or indirectly (e.g., loan disbursement, determining insurance coverage, shortlisting applications). As a result, concerns over such AI-enabled systems are also growing. To address these concerns, such systems are mandated to be responsible, i.e., transparent, fair, and explainable to developers and end-users. In this paper, we present ComplAI, a unique framework to enable, observe, analyze, and quantify explainability, robustness, performance, fairness, and model behavior under drift, and to provide a single Trust Factor that evaluates different supervised Machine Learning models not just by their ability to make correct predictions but from an overall responsibility perspective. The framework helps users to (a) connect their models and enable explanations, (b) assess and visualize different aspects of the model, such as robustness, drift susceptibility, and fairness, and (c) compare different models (from different model families or obtained through different hyperparameter settings) from an overall perspective, thereby facilitating actionable recourse for improving the models. It is model agnostic, works across supervised machine learning scenarios (binary classification, multi-class classification, and regression) and frameworks, and can be seamlessly integrated with any ML life-cycle framework. Thus, this already-deployed framework aims to unify critical aspects of Responsible AI systems and to regulate the development process of such real-world systems.
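As a rough illustration of how several responsibility aspects could be collapsed into a single score of this kind, the sketch below wraps per-aspect metrics into one weighted "trust factor". The class name, metric definitions, and weights are illustrative assumptions, not ComplAI's actual API.

```python
# Hypothetical sketch of a model-agnostic "trust factor" summary.
# ResponsibleModelAudit, its fields, and the weights are assumptions for
# illustration only; they are not ComplAI's real interface.
from dataclasses import dataclass
import numpy as np

@dataclass
class ResponsibleModelAudit:
    accuracy: float          # predictive performance on a held-out set
    robustness: float        # e.g., accuracy retained under small input perturbations
    fairness: float          # e.g., 1 - demographic-parity gap for a protected attribute
    explainability: float    # e.g., stability/sparsity of local explanations
    drift_resilience: float  # e.g., accuracy retained on a shifted test split

    def trust_factor(self, weights=(0.3, 0.2, 0.2, 0.15, 0.15)) -> float:
        """Collapse the aspect scores (each in [0, 1]) into a single number."""
        scores = np.array([self.accuracy, self.robustness, self.fairness,
                           self.explainability, self.drift_resilience])
        return float(np.dot(scores, np.array(weights)))

# Comparing two candidate models from an overall-responsibility perspective:
xgb_audit = ResponsibleModelAudit(0.91, 0.78, 0.85, 0.70, 0.74)
logreg_audit = ResponsibleModelAudit(0.87, 0.88, 0.93, 0.90, 0.86)
print(xgb_audit.trust_factor(), logreg_audit.trust_factor())
```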
Large "instruction-tuned" language models (finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off its own generations. Our pipeline generates instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model. Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT_001, which is trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT_001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning.
Drug targets are the main focus of drug discovery due to their key role in disease pathogenesis. Computational approaches are widely applied to drug development because of the increasing availability of biological molecular datasets. Popular generative approaches can create new drug molecules by learning the given molecule distributions. However, most of these approaches are not designed for target-specific drug discovery. We developed an energy-based probabilistic model for computational target-specific drug discovery. Results show that our proposed TagMol can generate molecules with binding-affinity scores similar to those of real molecules. GAT-based models learned faster and better than GCN baseline models.
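As a rough sketch of the GAT-versus-GCN comparison mentioned above, the PyTorch Geometric snippet below shows how a molecule encoder might swap graph-convolution layers for attention-based GAT layers. The architecture and the binding-affinity readout are illustrative assumptions, not the authors' TagMol model.

```python
# Illustrative molecule encoder with interchangeable GCN/GAT layers.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv, global_mean_pool

class MoleculeEncoder(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int, use_gat: bool = True):
        super().__init__()
        if use_gat:
            # Attention lets each atom weight its neighbours differently.
            self.conv1 = GATConv(in_dim, hidden, heads=4, concat=False)
            self.conv2 = GATConv(hidden, hidden, heads=4, concat=False)
        else:
            self.conv1 = GCNConv(in_dim, hidden)
            self.conv2 = GCNConv(hidden, hidden)
        self.readout = torch.nn.Linear(hidden, 1)  # e.g., a binding-affinity score

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.readout(global_mean_pool(h, batch))
```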
Mobile health (mHealth) technologies empower patients to adopt and maintain healthy behaviors in their daily lives by providing interventions (e.g., push notifications) tailored to the user's needs. In these settings, without intervention, human decision making may be impaired (e.g., valuing near-term pleasure over one's own long-term goals). In this work, we formalize this relationship with a framework in which the user optimizes a (potentially impaired) Markov Decision Process (MDP) and the mHealth agent intervenes on the user's MDP parameters. We show that different types of impairment imply different types of optimal intervention. We also provide analytical and empirical explorations of these differences.
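As a hedged illustration of such an impaired-MDP setup (the notation below is assumed for exposition, not taken from the paper), a myopic user can be modeled as optimizing the MDP under a distorted reward and a shrunken discount factor, while the agent intervenes on those parameters so that the induced behavior scores well under the true objective:

\[
\tilde{V}^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\Big[\sum_{t \ge 0} \tilde{\gamma}^{\,t}\, \tilde{r}(s_t, a_t)\Big]
\quad \text{(what the user maximizes, e.g. } \tilde{\gamma} < \gamma \text{ for a myopic user),}
\]
\[
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\Big[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{(the true objective the agent's intervention on } (\tilde{r}, \tilde{\gamma}) \text{ should serve).}
\]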
Probabilistic predictions from neural networks that account for predictive uncertainty during classification are crucial in many real-world, high-stakes decision-making settings. In practice, however, most models trained on such datasets are standard non-probabilistic neural networks, which by default do not capture this inherent uncertainty. This well-known problem has motivated post-hoc calibration procedures such as Platt (logistic) scaling, isotonic regression, and beta calibration, which transform raw scores into well-calibrated empirical probabilities. A principled alternative to calibration methods is to use Bayesian neural networks, which directly model the predictive distribution. Although they have been applied to image and text datasets, their adoption in the tabular and small-data regime has been limited. In this paper, we demonstrate through experiments on a variety of datasets that Bayesian neural networks deliver performance competitive with calibrated neural networks.
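For reference, the snippet below is a minimal sketch of Platt scaling, one of the post-hoc calibration baselines named above: a one-dimensional logistic regression is fit on a validation set to map raw scores to calibrated probabilities. Variable names are illustrative; a Bayesian neural network would instead obtain the predictive distribution directly by averaging over posterior weight samples.

```python
# Minimal sketch of post-hoc Platt (logistic) scaling.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt_scaler(val_scores: np.ndarray, val_labels: np.ndarray) -> LogisticRegression:
    """Fit a 1-D logistic regression mapping raw scores to calibrated probabilities."""
    scaler = LogisticRegression()
    scaler.fit(val_scores.reshape(-1, 1), val_labels)
    return scaler

def calibrated_probs(scaler: LogisticRegression, test_scores: np.ndarray) -> np.ndarray:
    """Apply the fitted scaler to new scores."""
    return scaler.predict_proba(test_scores.reshape(-1, 1))[:, 1]
```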
Learning privacy-preserving models from distributed sensitive data is an increasingly important problem, often formulated in the federated learning setting. Partitioned variational inference algorithms have recently been extended to the non-private federated learning setting. The current gold standard for privacy protection is differential privacy, which guarantees privacy in a strong, mathematically well-defined sense. In this paper, we introduce differentially private partitioned variational inference, the first general framework for learning a differentially private approximation to a Bayesian posterior distribution in the federated learning setting, while minimizing the number of communication rounds and providing differential privacy guarantees for data subjects. We propose three alternative implementations within the general framework: one based on perturbing the local optimization performed by individual parties, and two based on perturbing the global update (one using a version of federated averaging, one adding virtual parties to the protocol), and compare their properties both theoretically and empirically. We show that perturbing the local optimization works well with both simple and complex models, as long as each party has sufficient local data; in this case, each party always guarantees privacy independently. In contrast, perturbing the global update works best with relatively simple models; given access to suitable secure primitives, such as secure aggregation or secure shuffling, privacy can then be guaranteed jointly by all parties.
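As a rough sketch of the "perturbed global update" idea (not the paper's exact algorithm), the aggregation step below clips each party's contribution and adds Gaussian noise before the server applies the aggregate, in the style of standard differentially private federated averaging.

```python
# Illustrative DP-style aggregation of per-party updates (assumptions noted above).
import numpy as np

def dp_global_update(party_updates, clip_norm: float, noise_multiplier: float, rng=None):
    """Clip each party's update, sum, add Gaussian noise, and average."""
    rng = rng or np.random.default_rng()
    clipped = []
    for u in party_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))  # bound each party's influence
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(party_updates)
```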
When answering questions, humans leverage information available across different modalities to synthesize a coherent and complete chain of thought (CoT). In the case of deep learning models such as large-scale language models, this process is often a black box. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of AI systems. However, existing datasets either lack annotations for the answers or are restricted to a text-only modality, small scale, and limited domain diversity. To this end, we present Science Question Answering (SQA), a new benchmark consisting of ~21k multimodal multiple-choice questions spanning a diverse set of science topics, with answers annotated with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as a chain of thought (CoT) to mimic the multi-hop reasoning process when answering SQA questions. SQA demonstrates the utility of CoT in language models, as CoT improves question-answering performance by 1.20% for GPT-3 and 3.99% for UnifiedQA. We also explore the upper bound of models leveraging explanations by feeding them into the input; we observe that this improves GPT-3's few-shot performance by 18.96%. Our analysis further shows that, similar to humans, language models benefit from explanations, learning from less data and achieving the same performance with only 40% of the data.
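The helper below sketches what a chain-of-thought prompt for an SQA-style multiple-choice question might look like, with the model asked to produce the answer along with the lecture and explanation. The template wording is an assumption for illustration, not the benchmark's exact prompt format.

```python
# Illustrative chain-of-thought prompt builder for a multimodal science question
# (image content would be supplied separately or described in `context`).
def build_cot_prompt(question: str, choices: list[str], context: str = "") -> str:
    options = "\n".join(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices))
    return (
        f"Question: {question}\n"
        f"Context: {context}\n"
        f"Options:\n{options}\n"
        "Answer the question, then give the relevant lecture and an explanation "
        "of the reasoning (chain of thought):\n"
        "Answer:"
    )
```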
Controlling the text generated by language models and customizing its content has long been a challenge. Existing prompting techniques that aim to provide such control are task-specific and lack generality, leaving non-expert users with an overwhelming number of techniques to choose from when looking for one suited to their task. The effort these techniques require, such as writing examples, explanations, and instructions, further limits their adoption among non-expert users. In this paper, we propose Help Me Think, a simple prompting strategy in which we encourage GPT-3 to help non-expert users by asking a set of relevant questions and leveraging the user's answers to perform the task. We demonstrate the efficacy of the Help Me Think technique on a variety of tasks. Specifically, we focus on tasks that are difficult for average humans and require significant thought to perform. We hope our work encourages the development of unconventional ways to harness the power of large language models.
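A minimal sketch of this kind of two-stage interaction is shown below: the model first proposes the questions a non-expert should answer, then folds the answers back in to perform the task. The `lm_generate` placeholder and prompt wording are assumptions, not the paper's exact prompts.

```python
# Sketch of a Help-Me-Think-style interaction (assumptions noted above).
def lm_generate(prompt: str) -> str:
    """Placeholder for a GPT-3 completion call."""
    raise NotImplementedError

def help_me_think(task_description: str, ask_user) -> str:
    # Stage 1: the model proposes the questions a non-expert should answer.
    questions = lm_generate(
        f"I want to {task_description}. Ask me the questions you need answered first."
    ).splitlines()
    # Stage 2: the user's answers are folded back in and the model performs the task.
    qa = "\n".join(f"Q: {q}\nA: {ask_user(q)}" for q in questions if q.strip())
    return lm_generate(f"Task: {task_description}\n{qa}\nNow complete the task:")
```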
Using quantum computing, this paper addresses two scientifically pressing and practically relevant problems: chemical retrosynthesis, an important step in drug/material discovery, and security of the semiconductor supply chain. We show that quantum long short-term memory (QLSTM) is a viable tool for retrosynthesis. We achieve 65% training accuracy with QLSTM, whereas classical LSTM reaches 100%; in testing, however, we achieve 80% accuracy with QLSTM, while classical LSTM peaks at only 70%. We also demonstrate the application of quantum neural networks (QNN) in the hardware security domain, specifically hardware Trojan (HT) detection using a set of power and area Trojan features. The QNN model achieves detection accuracy of up to 97.27%.
In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pretrained on a mixture of denoising and causal language modeling (CLM) tasks, are more efficient learners than decoder-only models across a variety of tasks. In particular, we train a 20-billion-parameter multilingual seq2seq model called the Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming the much larger 540B PaLM decoder-only model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, on almost all language pairs supported by the model (including Arabic, English, French, German, Hindi, Italian, Japanese, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, PAWS-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for large language model (LLM) training.
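The sketch below illustrates, under stated assumptions, how a pretraining mixture of denoising and CLM examples for a seq2seq model might be constructed; the masking scheme, prefix split, and mixture ratio are illustrative, not the paper's exact preprocessing.

```python
# Illustrative construction of mixed denoising/CLM seq2seq training examples.
import random

MASK = "<mask>"

def denoising_example(tokens: list[str], drop_prob: float = 0.15) -> dict:
    """Corrupt the encoder input by masking tokens; the decoder target is the clean text."""
    corrupted = [MASK if random.random() < drop_prob else t for t in tokens]
    return {"source": " ".join(corrupted), "target": " ".join(tokens)}

def clm_example(tokens: list[str], prefix_frac: float = 0.2) -> dict:
    """Give the encoder a short prefix; the decoder must continue the text causally."""
    cut = max(1, int(len(tokens) * prefix_frac))
    return {"source": " ".join(tokens[:cut]), "target": " ".join(tokens[cut:])}

def make_example(tokens: list[str], clm_fraction: float = 0.2) -> dict:
    # Train on a mixture: mostly denoising, with a fraction of CLM examples.
    return clm_example(tokens) if random.random() < clm_fraction else denoising_example(tokens)
```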